Abstract

We estimate convex polytopes and general convex sets in $\mathbb{R}^d$, $d \ge 2$, in the regression
framework. We measure the risk of our estimators using an $L^1$-type loss function and
prove upper bounds on these risks. We show that, in the case of polytopes, these
estimators achieve the minimax rate. For polytopes, this minimax rate is $\frac{\ln n}{n}$, which
differs from the parametric rate for non-regular families by a logarithmic factor, and
we show that this extra factor is essential. Using polytopal approximations we extend
our results to general convex sets, and we achieve the minimax rate up to a logarithmic
factor. In addition we provide an estimator that is adaptive with respect to the number
of vertices of the unknown polytope, and we prove that this estimator is optimal in all
classes of polytopes with a given number of vertices.

Keywords : adaptive estimation, approximation, convex set, minimax, 
polytope, regression 

1 Introduction 

1.1 Definitions and notations 

Let $d \ge 2$ be an integer. Assume that we observe a sample of $n$ i.i.d. pairs
$(X_i, Y_i)$, $i = 1, \ldots, n$, such that $X_1, \ldots, X_n$ have the uniform distribution on $[0,1]^d$ and

$$Y_i = I(X_i \in G) + \xi_i, \quad i = 1, \ldots, n. \tag{1}$$

The collection $X_1, \ldots, X_n$ is called the design. The error terms $\xi_i$, $i = 1, \ldots, n$, are i.i.d.
random variables independent of the design, $G$ is a subset of $[0,1]^d$, and $I(\cdot \in G)$ stands
for the indicator function of the set $G$. Here we aim to estimate the set $G$ in Model (1).
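As an illustration of Model (1), the following minimal sketch simulates a sample in dimension $d = 2$ when $G$ is a convex polygon and the noise is Gaussian. The function names `in_convex_polygon` and `simulate_model`, the triangle, and the noise level are assumptions chosen for the example; they do not come from the paper.

```python
import random


def in_convex_polygon(point, vertices):
    """Return True if `point` lies in the closed convex polygon whose
    vertices are listed in counter-clockwise order."""
    x, y = point
    k = len(vertices)
    for i in range(k):
        ax, ay = vertices[i]
        bx, by = vertices[(i + 1) % k]
        # The point must lie on the left of (or on) every oriented edge a -> b.
        if (bx - ax) * (y - ay) - (by - ay) * (x - ax) < 0:
            return False
    return True


def simulate_model(n, vertices, sigma, rng):
    """Draw n pairs (X_i, Y_i) with X_i uniform on [0,1]^2 and
    Y_i = I(X_i in G) + xi_i, where xi_i is centered Gaussian noise."""
    sample = []
    for _ in range(n):
        x = (rng.random(), rng.random())
        y = float(in_convex_polygon(x, vertices)) + rng.gauss(0.0, sigma)
        sample.append((x, y))
    return sample


rng = random.Random(0)
triangle = [(0.2, 0.2), (0.8, 0.3), (0.4, 0.9)]  # hypothetical G, counter-clockwise
sample = simulate_model(500, triangle, sigma=0.5, rng=rng)
```

Only the pairs in `sample` are available to an estimator; the set `triangle` plays the role of the unknown $G$.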







A subset $\hat G_n$ of $[0,1]^d$ is called a set estimator, or simply, in our framework, an estimator,
if it is a Borel set and if there exists a real measurable function $f$ defined on $([0,1]^d \times \mathbb{R})^n$
such that $I(\cdot \in \hat G_n) = f(\cdot, X_1, Y_1, \ldots, X_n, Y_n)$.

If $G$ is a measurable (with respect to the Lebesgue measure on $\mathbb{R}^d$) subset of $[0,1]^d$, we
denote by $|G|_d$ or, when there is no possible confusion, simply by $|G|$, its Lebesgue measure
and by $\mathbb{P}_G$ the probability measure with respect to the distribution of the collection of $n$
pairs $(X_i, Y_i)$, $i = 1, \ldots, n$. Where it is necessary to indicate the dependence on $n$ we use
the notation $\mathbb{P}_G^{\otimes n}$. If $G_1$ and $G_2$ are two measurable subsets of $\mathbb{R}^d$, their Nikodym pseudo
distance $d_1(G_1, G_2)$ is defined as

$$d_1(G_1, G_2) = |G_1 \triangle G_2|. \tag{2}$$
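To make the Nikodym pseudo distance concrete, here is a small sketch that approximates $|G_1 \triangle G_2|$ for two subsets of $[0,1]^2$ given by their indicator functions. It uses a midpoint rule on a regular grid; the function name `nikodym_distance` and the grid size `m` are assumptions made for this illustration only.

```python
def nikodym_distance(indicator_1, indicator_2, m=200):
    """Approximate |G1 symmetric-difference G2| for subsets of [0,1]^2.

    Each set is given by a function mapping a point (x, y) to True/False.
    The area is approximated by counting the midpoints of an m x m grid
    of cells on which the two indicators disagree.
    """
    disagreements = 0
    for i in range(m):
        for j in range(m):
            point = ((i + 0.5) / m, (j + 0.5) / m)
            if indicator_1(point) != indicator_2(point):
                disagreements += 1
    return disagreements / (m * m)


def left_half(point):
    return point[0] <= 0.5


def lower_half(point):
    return point[1] <= 0.5


def unit_square(point):
    return True


print(nikodym_distance(left_half, unit_square))
print(nikodym_distance(left_half, lower_half))
```

Both calls compare sets whose symmetric difference has area $1/2$, and the pseudo distance of a set to itself is zero.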

Note that if $\hat G_n$ is a set estimator and $G$ is a measurable subset of $[0,1]^d$, then the
quantity $|G \triangle \hat G_n| = \int_{[0,1]^d} |I(x \in \hat G_n) - I(x \in G)|\,dx$ is well defined and by Fubini's theorem
it is measurable with respect to the probability measure $\mathbb{P}_G$. Therefore one can measure
the accuracy of the set estimator $\hat G_n$ on a given class of sets in the minimax framework:
the risk of $\hat G_n$ on a class $\mathcal{C}$ is defined as

$$\mathcal{R}_n(\hat G_n; \mathcal{C}) = \sup_{G \in \mathcal{C}} \mathbb{E}_G\left[|G \triangle \hat G_n|\right].$$

For all the estimators that we will define in the sequel we will be interested in upper bounds
on their risk, which give information about the rate at which these risks tend to zero, when
the number $n$ of available observations tends to infinity. For a given class of subsets $\mathcal{C}$, the
minimax risk on this class when $n$ observations are available is defined as

$$\mathcal{R}_n(\mathcal{C}) = \inf_{\hat G_n} \mathcal{R}_n(\hat G_n; \mathcal{C}),$$

where the infimum is taken over all set estimators depending on $n$ observations. If $\mathcal{R}_n(\mathcal{C})$
converges to zero, we call minimax rate of convergence on the class $\mathcal{C}$ the speed at which
$\mathcal{R}_n(\mathcal{C})$ tends to zero.

In this paper, we study minimax rates of convergence on two classes of subsets of
$[0,1]^d$: the class of all compact and convex sets, and the class of all polytopes with at most
$r$ vertices, where $r$ is a given positive integer. Let $\mathcal{C}$ be a given class of subsets of $[0,1]^d$. We
aim to provide lower bounds on the minimax risk on the class $\mathcal{C}$. Such a lower bound
indicates how close the risk of a given estimator is to the minimax risk
on the class that we consider. If the rate (a sequence depending on $n$) of the upper bound
on the risk of a given estimator matches the rate of the lower bound on the minimax
risk on the class $\mathcal{C}$, then this estimator is said to have the minimax rate of convergence on
this class.

We denote by $\rho$ the Euclidean distance in $\mathbb{R}^d$, by $B_d(y, r)$ the $d$-dimensional closed
Euclidean ball centered at $y \in \mathbb{R}^d$ with radius $r$, and by $\beta_d$ the volume of the Euclidean
unit ball in dimension $d$. For any positive real number $x$, we denote by $\lfloor x \rfloor$ the greatest
integer that is less than or equal to $x$. Any convex set that we will consider in the following is
assumed to be compact and with nonempty interior in the considered topological space.

1.2 Former results and contributions 

Estimation of convex sets and, more generally, of sets, has been extensively studied in
the previous decades (see the nice surveys given in Cuevas [5] and Cuevas and Fraiman [6]
and the references therein, and related topics in p3]). First works, in the 1960's, due to 
Rényi and Sulanke [21], [25], and Efron [9] were motivated by issues of stochastic geometry,
discussed, for instance, in the book by Kendall and Moran [2] and [1]. Most of the works
on estimation of convex sets dealt with models different from ours. Rényi and Sulanke [24],
[25] were the first to study the convex hull of a sample of $n$ i.i.d. random points in the
plane. They obtained exact asymptotic formulas for the expected area and the expected
number of vertices when the points are uniformly distributed over a convex set, and when
they have a Gaussian distribution. They showed that if the points are uniformly distributed
over a convex set $K$ in the plane $\mathbb{R}^2$, then the expected missing area $\mathbb{E}[|K \setminus \hat K|]$ of the convex
hull $\hat K$ of the collection of these points is of the order

- $n^{-2/3}$ if the boundary of $K$ is smooth,

- $r \ln n / n$ if $K$ is a polygon with $r$ vertices.

This result was generalized to any dimension, and we refer to [2] for an overview.

Estimation of convex sets in a multiplicative regression model has been investigated by 
Mammen and Tsybakov [20J and Korostelev and Tsybakov [TTJ. The design (X\, . . . ,X n ) 
may be either random or deterministic, in $[0,1]^d$. In [20] Mammen and Tsybakov proposed
an estimator of $G$ when it is assumed to be convex, based on likelihood maximization over
an $\epsilon$-net, whose cardinality is bounded in terms of the metric entropy [8]. They showed,
with no assumption on the design, that the rate of their estimator cannot be improved.

The additive model (1) has been studied in [16] and [17], in the case where $G$ belongs
to a smooth class of boundary fragments and the error terms are i.i.d. Gaussian variables
with known variance. If $\gamma$ is the smoothness parameter of the studied class, it is shown
that the rate of the minimax risk on the class is $n^{-\gamma/(\gamma+d-1)}$. The case of convex boundary
fragments is covered by the case $\gamma = 2$, which leads to the expected rate for the minimax
risk, as we will discuss later (Section 5): $n^{-2/(d+1)}$. It is important to note that in these
works the authors always assumed that the fragment, which is included in $[0,1]^d$, has a
boundary which is uniformly separated from 0 and 1. We will not make such an assumption
in our work. Cuevas and Rodríguez-Casal [7], and Pateiro-López [22], studied the properties
of set estimators of the support of a density, under several geometrical assumptions on the
boundary of the unknown set.

One problem has not been investigated yet: how is the minimax rate of convergence
modified if one assumes that the unknown set $G$, in model (1), is a polytope with a bounded
number of vertices? This question can be reformulated in a more general framework when
one deals with boundary fragments: what is the minimax rate of convergence if $G$ is a
fragment which belongs to a parametric family? In the method used in [16] and [17], the
true fragment is first approximated by an element of a parametric family of fragments,
whose dimension is chosen afterwards according to the optimal bias-variance tradeoff, and
the proposed estimator actually estimates the parametric approximation of the fragment
$G$, and not directly $G$ itself. This idea is exploited in the present work, when we estimate
convex sets, by using polytopal approximations. In the framework of fragments, the rate of
convergence of the estimator when the target is the parametric fragment is found to be of
the order $M/n$, where $M$ is the dimension of the parametric class of fragments. Again, the
assumption of uniform separation from 0 and 1 is made. As we will show in the sequel, this
assumption is essential in the parametric case, because if it is relaxed, an extra logarithmic
factor appears in the rate.

In order to estimate convex sets, we will first approximate a convex set by a polytope,
and then estimate that polytope. There is a wide literature on polytopal approximation of
convex sets (cf. [21], [10], ...), which is of essential use in this paper.

For an integer $r \ge d+1$, we denote by $\mathcal{P}_r$ the class of all polytopes in $[0,1]^d$ with
at most $r$ vertices. This class may be embedded into the finite-dimensional space $\mathbb{R}^{dr}$ since
any polytope is completely defined by the coordinates of its vertices. Therefore, one may
expect that the problem of estimating $G \in \mathcal{P}_r$, for a given $r$, is parametric, and therefore expect
a rate of the order $1/n$ for the minimax risk $\mathcal{R}_n(\mathcal{P}_r)$, cf. [11]. In Section 2, we propose an
estimator that almost achieves this rate, up to a logarithmic factor. Moreover, we prove
an exponential deviation inequality for the Nikodym distance between the estimator and
the true polytope. Such an exponential inequality is of interest because it is much stronger
than an upper bound on the risk of the estimator, and it is the key that leads to adaptive
estimation, as we will see later. In Section 2, we also show that this estimator has the minimax
rate of convergence, so that the logarithmic factor in the rate is unavoidable. In Section 3,
we extend the exponential deviation inequality of Section 2 in order to cover estimation of
any convex set. In Section 4, we propose an estimator that is adaptive to the number of
vertices of the estimated polytope, using as a convention that a non-polytopal convex set
can be considered as a polytope with infinitely many vertices. Section 5 discusses the results,
and Section 6 is devoted to the proofs.



2 Estimation of Convex Polytopes 

2.1 Upper bound 

We denote by $P_0$ the true polytope, i.e. $G = P_0$ in (1), and we assume that $P_0 \in \mathcal{P}_r$.
We denote by $\mathcal{P}_r^{(n)}$ the class of all the polytopes in $[0,1]^d$ with at most $r$ vertices with
coordinates that are integer multiples of $\frac{1}{n}$. It is clear that the cardinality of $\mathcal{P}_r^{(n)}$ is less
than $(n+1)^{dr}$. We have the following lemma, proved in Section 6.

Lemma 1. Let $r \le n$. For any polytope $P$ in $\mathcal{P}_r$ there exists a polytope $P^* \in \mathcal{P}_r^{(n)}$ such
that

$$|P^* \triangle P| \le \frac{2d^{d+1}(3/2)^d\beta_d}{n}. \tag{3}$$

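We estimate $P_0$ by a polytope in $\mathcal{P}_r^{(n)}$ that minimizes a given criterion. The criterion
that we use is the sum of squared errors $\sum_{i=1}^n (Y_i - I(X_i \in P))^2$ which, up to the additive
term $\sum_{i=1}^n Y_i^2$ that does not depend on $P$, is equal to

$$A(P, \{(X_i, Y_i)\}_{i=1,\ldots,n}) = \sum_{i=1}^n (1 - 2Y_i) I(X_i \in P).$$

In order to simplify the notations, we will write $A(P)$ instead of $A(P, \{(X_i, Y_i)\}_{i=1,\ldots,n})$ in
what follows. Note that if the noise terms $\xi_i$, $i = 1, \ldots, n$, are supposed to be Gaussian,
then minimization of $A(P)$ is equivalent to maximization of the likelihood.
Consider the set estimator of $P_0$ defined as

$$\hat P_n^{(r)} \in \operatorname*{argmin}_{P \in \mathcal{P}_r^{(n)}} A(P). \tag{4}$$

Note that since $\mathcal{P}_r^{(n)}$ is finite, the estimator $\hat P_n^{(r)}$ exists but is not necessarily unique.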
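The estimator (4) can be illustrated by a brute-force search in a toy setting. The sketch below is not the paper's implementation: it assumes $d = 2$ and $r = 3$ (triangles), skips degenerate triangles, which have zero area, and, to keep the search small, uses a grid of step $1/m$ with a small hypothetical `m` instead of the step $1/n$ of the class $\mathcal{P}_r^{(n)}$. All function names are chosen for the example.

```python
import itertools
import random


def in_triangle(p, tri):
    """Membership in a closed triangle given by three vertices (any orientation)."""
    (ax, ay), (bx, by), (cx, cy) = tri
    d1 = (bx - ax) * (p[1] - ay) - (by - ay) * (p[0] - ax)
    d2 = (cx - bx) * (p[1] - by) - (cy - by) * (p[0] - bx)
    d3 = (ax - cx) * (p[1] - cy) - (ay - cy) * (p[0] - cx)
    has_neg = d1 < 0 or d2 < 0 or d3 < 0
    has_pos = d1 > 0 or d2 > 0 or d3 > 0
    return not (has_neg and has_pos)


def criterion(tri, sample):
    """A(P) = sum_i (1 - 2 Y_i) I(X_i in P)."""
    return sum(1.0 - 2.0 * y for x, y in sample if in_triangle(x, tri))


def least_squares_triangle(sample, m):
    """Minimize A(P) over non-degenerate triangles with vertices on the grid of step 1/m."""
    grid = [(i / m, j / m) for i in range(m + 1) for j in range(m + 1)]
    best, best_value = None, float("inf")
    for tri in itertools.combinations(grid, 3):
        (ax, ay), (bx, by), (cx, cy) = tri
        if (bx - ax) * (cy - ay) - (by - ay) * (cx - ax) == 0:
            continue  # flat triangle, zero area
        value = criterion(tri, sample)
        if value < best_value:
            best, best_value = tri, value
    return best, best_value


rng = random.Random(0)
true_tri = ((0.25, 0.25), (0.75, 0.25), (0.5, 0.75))  # hypothetical true polytope
sample = []
for _ in range(300):
    x = (rng.random(), rng.random())
    sample.append((x, float(in_triangle(x, true_tri)) + rng.gauss(0.0, 0.3)))

estimate, value = least_squares_triangle(sample, m=4)
print(estimate, value)
```

Since the true triangle has its vertices on the grid used here, the minimal value of the criterion cannot exceed the value at the true triangle. The search is exponential in the number of vertices, so this only illustrates the definition and says nothing about how the estimator should be computed in practice.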

Let us introduce the following assumption on the law of the noise terms $\xi_i$, $i = 1, \ldots, n$:
Assumption A. The noise terms $\xi_i$, $i = 1, \ldots, n$, are subgaussian, i.e. satisfy the following
exponential inequality:

$$\mathbb{E}\left[e^{u\xi_i}\right] \le e^{\frac{\sigma^2u^2}{2}}, \quad \forall u \in \mathbb{R},$$

where $\sigma$ is a given positive number.

Note that if the noise terms $\xi_i$, $i = 1, \ldots, n$, are i.i.d. centered Gaussian random
variables with variance at most $\sigma^2$, then Assumption A is satisfied.
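For completeness, the Gaussian case can be checked directly. The computation below is the standard one, written for a centered Gaussian variable $\xi$ with variance $s^2$ (the symbol $s$ is introduced here only for this check); the exponential inequality of Assumption A then holds as soon as $s \le \sigma$.

```latex
\[
\mathbb{E}\left[e^{u\xi}\right]
= \int_{\mathbb{R}} e^{ut}\,\frac{1}{s\sqrt{2\pi}}\,e^{-\frac{t^2}{2s^2}}\,dt
= e^{\frac{s^2u^2}{2}} \int_{\mathbb{R}} \frac{1}{s\sqrt{2\pi}}\,e^{-\frac{(t - s^2u)^2}{2s^2}}\,dt
= e^{\frac{s^2u^2}{2}}
\le e^{\frac{\sigma^2u^2}{2}},
\qquad \forall u \in \mathbb{R}.
\]
```

The second equality completes the square, $ut - \frac{t^2}{2s^2} = -\frac{(t - s^2u)^2}{2s^2} + \frac{s^2u^2}{2}$, and the remaining integral is that of a Gaussian density.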

The next theorem establishes an exponential deviation inequality for the estimator $\hat P_n^{(r)}$.

Theorem 1. Let Assumption A be satisfied. For the estimator $\hat P_n^{(r)}$, there exist two positive
constants $C_1$ and $C_2$, which depend on $d$ and $\sigma$ only, such that:

$$\sup_{P \in \mathcal{P}_r} \mathbb{P}_P\left[ n\left( |\hat P_n^{(r)} \triangle P| - \frac{2dr\ln n}{C_2 n} \right) \ge x \right] \le C_1 e^{-C_2 x}, \quad \forall x > 0.$$



The explicit forms of the constants $C_1$ and $C_2$ are given in the proof. From the deviation
inequality given in Theorem 1 one can easily derive that the risk of the estimator $\hat P_n^{(r)}$ on
the class $\mathcal{P}_r$ is of the order $\frac{\ln n}{n}$. Indeed we have the following result.

Corollary 1. Let the assumptions of Theorem 1 be satisfied. Then, for any positive number
$q$, there exists a constant $A_q$ such that

$$\sup_{P \in \mathcal{P}_r} \mathbb{E}_P\left[ |\hat P_n^{(r)} \triangle P|^q \right] \le A_q \left( \frac{dr\ln n}{n} \right)^q.$$

The explicit form of the constant $A_q$ can be easily derived from the proof.

2.2 Lower bound 

Corollary 1 gives an upper bound of the order $\frac{\ln n}{n}$ for the risk of our estimator $\hat P_n^{(r)}$.
The next result shows that $\frac{\ln n}{n}$ is the minimax rate of convergence on the class $\mathcal{P}_r$.

Theorem 2. Assume that the noise terms $\xi_i$, $i = 1, \ldots, n$, are centered Gaussian random
variables, with a given variance $\sigma^2 > 0$. For every $r \ge d+1$, we have the following lower
bound

$$\inf_{\hat P} \sup_{P \in \mathcal{P}_r} \mathbb{E}_P\left[ |\hat P \triangle P| \right] \ge \frac{\alpha^2\sigma^2 \ln n}{n},$$

where $\alpha = \frac{1}{2} - \frac{\ln 2}{2\ln 3} \approx 0.18$.

Corollary 1 together with Theorem 2 gives the following bound on the class $\mathcal{P}_r$, in the
case of Gaussian noise terms with variance $\sigma^2$:

aV < - n n {V r ) < j-, 

Inn i_ e "3^ 

for $n$ large enough and $d + 1 \le r \le n$. Note that the lower bound does not depend on the
number of vertices $r$. This is because we prove our lower bound for the class $\mathcal{P}_{d+1}$ and we
use that $\mathcal{P}_r \supseteq \mathcal{P}_{d+1}$, for $r \ge d + 1$.



3 Estimation of General Convex Sets 

3.1 Upper bound 

Let us denote by $\mathcal{C}_d$ the class of all convex sets included in $[0,1]^d$.

Now we aim to estimate convex sets in the same model, without any assumption on
the form of the unknown set. If $C$ is a convex set and $G = C$ in model (1), an idea is to
approximate $C$ by a convex polytope. For example one can select $r$ points on the boundary
of $C$ and take their convex hull. This will give a polytope $C_r$ with $r$ vertices inscribed
in $C$. In Section 2 we showed how to estimate an $r$-vertex polytope such as $C_r$. Thus, if
$C_r$ approximates $C$ well, an estimator of $C_r$ is a candidate to be a good estimator of $C$.
The larger $r$ is, the better $C_r$ should approximate $C$ with respect to the Nikodym distance
defined in (2). At the same time, when $r$ increases the upper bound given in Corollary 1
increases as well. Therefore $r$ should be chosen according to the bias-variance tradeoff.
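The tradeoff described above can be made explicit with a short heuristic computation. It is only a sketch: it takes the stochastic term of order $r \ln n / n$ from Corollary 1 and assumes that the approximation error of a convex set by an $r$-vertex polytope is of order $r^{-2/(d-1)}$, the classical rate of polytopal approximation; constants are ignored.

```latex
\[
\underbrace{\frac{r\ln n}{n}}_{\text{stochastic term}}
\asymp
\underbrace{r^{-\frac{2}{d-1}}}_{\text{approximation error}}
\iff
r^{\frac{d+1}{d-1}} \asymp \frac{n}{\ln n}
\iff
r \asymp \left(\frac{n}{\ln n}\right)^{\frac{d-1}{d+1}},
\]
\[
\text{and for this choice of } r:\qquad
\frac{r\ln n}{n} \asymp r^{-\frac{2}{d-1}} \asymp \left(\frac{\ln n}{n}\right)^{\frac{2}{d+1}}.
\]
```

For instance, in dimension $d = 2$ this gives $r$ of order $(n/\ln n)^{1/3}$ and a rate of order $(\ln n / n)^{2/3}$.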

For any integer $r \ge d+1$ consider again the estimator $\hat P_n^{(r)}$ defined in (4). However,
now we choose a value for $r$ that depends on $n$ in order to achieve the bias-variance tradeoff.

Theorem 3. Consider model (1) with $G = C$, where $C$ is any convex subset of $[0,1]^d$. Set

$$r = \left\lfloor \left( \frac{n}{\ln n} \right)^{\frac{d-1}{d+1}} \right\rfloor$$

and let $\hat P_n^{(r)}$ be the estimator defined in (4). Let Assumption A be satisfied.
Then, there exist positive constants $C_1$, $C_2$ and $C_3$, which depend on $d$ and $\sigma$ only, such
that

$$\sup_{C \in \mathcal{C}_d} \mathbb{P}_C\left[ n\left( |\hat P_n^{(r)} \triangle C| - \left( \frac{C_3\ln n}{n} \right)^{\frac{2}{d+1}} \right) \ge x \right] \le C_1 e^{-C_2 x}, \quad \forall x > 0.$$

The constants $C_1$ and $C_2$ are the same as in Theorem 1, and $C_3$ is given explicitly in
the proof of the theorem. From Theorem 3 we get the next corollary.

Corollary 2. Let the assumptions of Theorem 3 be satisfied. Then, for any positive number
$q$ there exists a positive constant $A'_q$ such that

$$\sup_{C \in \mathcal{C}_d} \mathbb{E}_C\left[ |\hat P_n^{(r)} \triangle C|^q \right] \le A'_q \left( \frac{\ln n}{n} \right)^{\frac{2q}{d+1}}.$$

The explicit form of $A'_q$ can be easily derived from the proof.



3.2 Lower bound 

In this section we give a lower bound on the minimax risk on the class $\mathcal{C}_d$ of all convex
sets in $[0,1]^d$.

Theorem 4. Assume that the noise terms $\xi_i$, $i = 1, \ldots, n$, are centered Gaussian random
variables, with a given variance $\sigma^2 > 0$. There exists a positive constant $C_{10}$ which depends
only on the dimension $d$ and on $\sigma$, such that for any $n \ge 125$ and any estimator $\hat C$,

$$\sup_{C \in \mathcal{C}_d} \mathbb{E}_C\left[ |C \triangle \hat C| \right] \ge C_{10}\, n^{-2/(d+1)}.$$

The explicit form of the constant $C_{10}$ can be found in the proof of the theorem. One
can see that the lower bound given in Theorem 4 does not match the upper bound in
Theorem 3, where we had an extra logarithmic factor. Indeed we get that



Cl7n -2/ (( m) < nn{Cd) < 3 
This gap is discussed in Section 5. 



Bi lnn\ d +! 



n 



4 Adaptive estimation 

In Section 2, we proposed an estimator that depends on the parameter $r$. A natural
question is to find an estimator that is adaptive to $r$, i.e. that does not depend on $r$, but
achieves the optimal rate on the class $\mathcal{P}_r$. The idea of the following comes from Lepski's
method for adaptation (see [19], or [3], Section 1.5, for a nice overview). Assume that the
true number of vertices, denoted by $r^*$, is unknown, but is bounded from above by a given
integer $R_n \ge d+1$ that may depend on $n$ and be arbitrarily large. Theorem 1 would provide
the estimator $\hat P_n^{(R_n)}$, but it is clearly suboptimal if $r^*$ is small and $R_n$ is large. Indeed the
rate of convergence of $\hat P_n^{(R_n)}$ is $\frac{R_n \ln n}{n}$, although the rate $\frac{r^* \ln n}{n}$ can be achieved according
to Theorem 1, when $r^*$ is known. The procedure that we propose selects an integer $\hat r$ based
on the observations, and the resulting estimator is $\hat P_n^{(\hat r)}$.

Note that $R_n$ should not be of order larger than $\left( \frac{n}{\ln n} \right)^{\frac{d-1}{d+1}}$, since for larger values of $r$,
Corollaries 1 and 2 show that it is more efficient to consider the class $\mathcal{C}_d$ than the class $\mathcal{P}_r$.
Let us define:

f = min jr G {d + 1, . . . , R n } : |pW AP^\ < Vr' = r, . . . , rA . 

The integer $\hat r$ is well defined, because the set in the brackets is not empty, since $R_n$
satisfies the condition.
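The selection rule can be sketched generically: given a family of estimators indexed by $r$, pick the smallest $r$ whose estimator stays close, in Nikodym distance, to every estimator with a larger index. In the sketch below the function `select_r`, the callable `distance` and the callable `threshold` are hypothetical names; in particular the threshold used in the toy example is an arbitrary placeholder and not the one of the paper.

```python
def select_r(r_values, distance, threshold):
    """Lepski-type selection.

    r_values  : increasing list of candidate numbers of vertices
    distance  : distance(r, rp) -> distance between the estimators built with r and rp
    threshold : threshold(rp)   -> tolerance allowed when comparing with index rp

    Returns the smallest r such that distance(r, rp) <= threshold(rp)
    for every candidate rp >= r.
    """
    for idx, r in enumerate(r_values):
        if all(distance(r, rp) <= threshold(rp) for rp in r_values[idx:]):
            return r
    # Unreachable when distance(r, r) == 0 and thresholds are nonnegative,
    # because the largest candidate always satisfies the condition.
    return r_values[-1]


# Toy example: each "estimator" is an interval [0, b_r], so that the
# distance between two of them is simply |b_r - b_rp|.
endpoints = {3: 0.30, 4: 0.42, 5: 0.50, 6: 0.51, 7: 0.49}
r_values = sorted(endpoints)


def toy_distance(r, rp):
    return abs(endpoints[r] - endpoints[rp])


def toy_threshold(rp):
    return 0.01 * rp  # placeholder tolerance, increasing with the index


r_hat = select_r(r_values, toy_distance, toy_threshold)
print(r_hat)
```

In the toy example the estimators stabilize from index 5 onwards, so smaller indices are rejected because they differ too much from some larger one.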

Let us define the adaptive estimator $\hat P_n^{adapt} = \hat P_n^{(\hat r)}$. We then have the following theorem.

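Theorem 5. Let Assumption A be satisfied.
Let $R_n = \left\lfloor \left( \frac{n}{\ln n} \right)^{\frac{d-1}{d+1}} \right\rfloor$ and $\phi_{n,r} = \min\left( \frac{dr\ln n}{n}, \left( \frac{\ln n}{n} \right)^{\frac{2}{d+1}} \right)$ for all integers $r \ge d + 1$ and
$r = \infty$. There exists a positive constant $C_5$ that depends on $d$ and $\sigma$ only, such that the
adaptive estimator $\hat P_n^{adapt}$ satisfies the following inequality:

$$\sup_{d+1 \le r \le \infty}\ \sup_{P \in \mathcal{P}_r} \mathbb{E}_P\left[ \phi_{n,r}^{-1}\, |\hat P_n^{adapt} \triangle P| \right] \le C_5, \quad \forall n > 1,$$

where $\mathcal{P}_\infty = \mathcal{C}_d$.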



Thus, we show that one and the same estimator $\hat P_n^{adapt}$ attains the optimal rate
simultaneously on all the classes $\mathcal{P}_r$, $d + 1 \le r < \infty$, and a near optimal rate (optimal up to a
logarithmic factor) on the class $\mathcal{C}_d$ of all convex subsets of $[0,1]^d$. The explicit form of the
constant $C_5$ can be easily derived from the proof of the theorem.



5 Discussion 



In Theorems 3 and 4, the upper and lower bounds differ by a logarithmic factor, and
a question is which of the two bounds could be improved. Theorems 1 and 2 show that
the logarithmic factor is significant in the case of polytopes. Is it still the case for general
convex sets?

Let us first answer the following question: what makes the estimation of sets on a
given class $\mathcal{C} \subseteq \mathcal{C}_d$ difficult in the studied model? First, it is the complexity of the class.
As introduced by Dudley [8], the complexity of the class quantifies how big the class is,
or in more precise words, the number of elements that are needed in order to discretize
the class with a given precision. The more such elements there are, the more complex the
class is, and the more complicated it is to estimate an unknown element of it. Second, it
is how detectable the sets of the given class are, in our model. If the unknown subset $G$ is
too small, then, with high probability, it contains no point of the design. Conditionally on
this, all the data have the same distribution and no information in the sample can be used
in order to detect $G$. A subset $G$ has to be large enough in order to be detectable by a
given procedure. The threshold on the volume below which a subset cannot be detected
by any procedure gives a lower bound on the rate of the minimax risk. In [12], Janson
studied asymptotic properties of the maximal volume of holes with a given shape. A hole
is a subset of $[0,1]^d$ that contains no point of the design $(X_1, \ldots, X_n)$. Janson showed that
with high probability, there are convex and polytopal holes that have a volume of order
$\ln n / n$. This result made it reasonable to think that $\ln n / n$ should be the order of a lower
bound on the minimax risk in Theorem 2; this is the idea that we use in the proof of this
theorem. The lower bound is attained on the polytopes with very small volumes. We do
not use the specific structure of these polytopes to derive the lower bound: we only use
the fact that some of them cannot be distinguished from the empty set, no matter what
the shape of their boundary is, when we choose their volume of order no larger than $\ln n / n$.
This shows that the rate $1/n$, which would come from the complexity of the parametric
class $\mathcal{P}_r$, is not the right minimax rate of convergence; the order $\ln n / n$, larger than $1/n$,
imposes its law on this class. On the other hand, the proof of our lower bound of the order
$n^{-2/(d+1)}$ for general convex sets uses only the structure and regularity of the boundaries;
we do not deal especially with small hypotheses. The order $n^{-2/(d+1)}$ is much larger than
$\ln n / n$, and therefore seems to determine the best lower bound achievable on the minimax
risk on the class $\mathcal{C}_d$.
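The detectability argument can be illustrated numerically. The sketch below is not Janson's result: it only uses the elementary fact that a fixed set of volume $v$ contains none of the $n$ i.i.d. uniform design points with probability $(1 - v)^n$, and it computes the expected number of empty sets among as many disjoint sets of volume $c \ln n / n$ as fit in the unit cube. The function names and the constant `c` are assumptions made for the illustration.

```python
import math


def prob_no_design_point(volume, n):
    """P(no X_i falls in a fixed set of Lebesgue measure `volume`)
    for n i.i.d. points uniform on the unit cube."""
    return (1.0 - volume) ** n


def expected_empty_cells(n, c):
    """Expected number of empty sets among disjoint sets of volume c*ln(n)/n.

    The number of disjoint sets is the largest one allowed by the unit
    volume of the cube; the expectation follows from linearity.
    """
    volume = c * math.log(n) / n
    n_cells = int(1.0 / volume)
    return n_cells * prob_no_design_point(volume, n)


for n in (100, 10_000, 1_000_000):
    print(n, expected_empty_cells(n, 0.5), expected_empty_cells(n, 2.0))
```

With `c = 0.5` the expected number of empty sets grows with $n$, whereas with `c = 2.0` it tends to zero, which is consistent with $\ln n / n$ being the critical volume for holes.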

Regarding this discussion we formulate two conjectures.

Conjecture 1 We conjecture that the risk of our estimator could be more sharply
bounded than in Theorem 1, i.e. that

$$\max\left( \frac{\lambda_1 \ln n}{n}, \frac{\lambda_2 r}{n} \right) \le \mathcal{R}_n(\mathcal{P}_r) \le \max\left( \frac{\lambda_3 \ln n}{n}, \frac{\lambda_4 r}{n} \right),$$

for some positive constants $\lambda_1$, $\lambda_2$, $\lambda_3$ and $\lambda_4$. If $\ln n$ is sufficiently larger than $r$, the right
order of the minimax risk is $\frac{\ln n}{n}$. If not, i.e. if the number of vertices of the unknown
polytope can be large, the order of the risk is $\frac{r}{n}$. This lower bound is actually easy to prove
when $d = 2$, using the same scheme as in the proof of the case $d = 2$ of Theorem 4.

Conjecture 2 Let $\mu_0$ be a given positive number. If one considers the subclass $\mathcal{P}'_r(\mu_0) =
\{P \in \mathcal{P}_r : |P| \ge \mu_0\}$, then subsets of $[0,1]^d$ with too small volume are not allowed anymore.
Therefore, the hypotheses used in the proof of Theorem 2 are not valid anymore and we
expect the minimax rate of convergence on this class to be of the order $1/n$.

Remark 1. If Conjecture 1 is true, and if we keep our method for estimating general
convex sets and follow the proof of Theorem 3, the bias-variance tradeoff leads to a choice
for $r$ of the order $n^{(d-1)/(d+1)}$, which is much larger than $\ln n$. Therefore, the risk has the rate
$\frac{r}{n} = n^{-2/(d+1)}$ and the logarithmic factor is dropped.



6 Proofs 

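Proof of Theorem 1 Let $P_0 \in \mathcal{P}_r$ be the true polytope. Note that for all $\epsilon > 0$,

$$\mathbb{P}_{P_0}\left[ |\hat P_n^{(r)} \triangle P_0| \ge \epsilon \right] \le \mathbb{P}_{P_0}\left[ \exists P \in \mathcal{P}_r^{(n)} : A(P) \le A(P^*),\ |P \triangle P_0| \ge \epsilon \right], \tag{5}$$

where $P^*$ is a polytope chosen in $\mathcal{P}_r^{(n)}$ such that $|P^* \triangle P_0| \le \frac{2d^{d+1}(3/2)^d\beta_d}{n}$, cf. (3). For any $P$ we
have, by simple algebra,

$$A(P) - A(P^*) = \sum_{i=1}^n Z_i, \tag{6}$$

where

$$Z_i = I(X_i \in P) - I(X_i \in P^*) - 2 I(X_i \in P_0)\left[ I(X_i \in P) - I(X_i \in P^*) \right] - 2\xi_i \left[ I(X_i \in P) - I(X_i \in P^*) \right], \quad i = 1, \ldots, n.$$

The random variables $Z_i$ depend on $P$ but we omit this dependence in the notation.
Therefore (5) implies that

$$\mathbb{P}_{P_0}\left[ |\hat P_n^{(r)} \triangle P_0| \ge \epsilon \right]
\le \sum_{P \in \mathcal{P}_r^{(n)} : |P \triangle P_0| \ge \epsilon} \mathbb{P}_{P_0}\left[ \sum_{i=1}^n Z_i \le 0 \right]
\le \sum_{P \in \mathcal{P}_r^{(n)} : |P \triangle P_0| \ge \epsilon} \mathbb{E}_{P_0}\left[ \exp\left( -u \sum_{i=1}^n Z_i \right) \right] \tag{7}$$

for all positive numbers $u$, by Markov's inequality. Since the $Z_i$'s are mutually independent, we
obtain

$$\mathbb{P}_{P_0}\left[ |\hat P_n^{(r)} \triangle P_0| \ge \epsilon \right] \le \sum_{P \in \mathcal{P}_r^{(n)} : |P \triangle P_0| \ge \epsilon}\ \prod_{i=1}^n \mathbb{E}_{P_0}\left[ \exp(-uZ_i) \right]. \tag{8}$$

By conditioning on $X_1$ and denoting $W = I(X_1 \in P) - I(X_1 \in P^*)$ we have

$$\begin{aligned}
\mathbb{E}_{P_0}\left[ \exp(-uZ_1) \right]
&= \mathbb{E}_{P_0}\left[ \mathbb{E}_{P_0}\left[ \exp(-uZ_1) \mid X_1 \right] \right] \\
&= \mathbb{E}_{P_0}\left[ \exp\left( -uW + 2uI(X_1 \in P_0)W \right) \mathbb{E}_{P_0}\left[ \exp(2u\xi_1 W) \mid X_1 \right] \right] \\
&\le \mathbb{E}_{P_0}\left[ \exp\left( -uW + 2uI(X_1 \in P_0)W \right) \exp\left( 2\sigma^2u^2 I(X_1 \in P \triangle P^*) \right) \right] \\
&= \mathbb{E}_{P_0}\left[ \exp\left( 2\sigma^2u^2 I(X_1 \in P \triangle P^*) - uW + 2uI(X_1 \in P_0)W \right) \right],
\end{aligned} \tag{9}$$

where the inequality follows from Assumption A applied with $2uW$ in place of $u$, together with $W^2 = I(X_1 \in P \triangle P^*)$.

We will now reduce the last expression in (9). It is convenient to use the following table
in which the first three columns represent the values that can be taken by the binary
variables $I(X_1 \in P)$, $I(X_1 \in P^*)$ and $I(X_1 \in P_0)$ respectively, and the last column gives
the resulting value of the term $\exp\left( 2\sigma^2u^2 I(X_1 \in P \triangle P^*) - uW + 2uI(X_1 \in P_0)W \right)$ that
is under the expectation in (9).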



  P    P*   P_0   Value
  1    1    1     1
  1    1    0     1
  1    0    1     exp(2σ²u² + u)
  1    0    0     exp(2σ²u² - u)
  0    1    1     exp(2σ²u² - u)
  0    1    0     exp(2σ²u² + u)
  0    0    1     1
  0    0    0     1

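Hence one can write

$$\mathbb{E}_{P_0}\left[ \exp(-uZ_1) \right] \le 1 - |P \triangle P^*| + e^{2\sigma^2u^2 + u}\left( |(P \cap P_0) \setminus P^*| + |P^* \setminus (P \cup P_0)| \right) + e^{2\sigma^2u^2 - u}\left( |(P^* \cap P_0) \setminus P| + |P \setminus (P^* \cup P_0)| \right).$$

Besides by the triangle inequality,

$$|P \triangle P_0| \le |P \triangle P^*| + |P^* \triangle P_0|,$$

which implies

$$\begin{aligned}
\mathbb{E}_{P_0}\left[ \exp(-uZ_1) \right]
&\le 1 - |P \triangle P_0| + |P^* \triangle P_0| + e^{2\sigma^2u^2 + u}\left( |P_0 \setminus P^*| + |P^* \setminus P_0| \right) + e^{2\sigma^2u^2 - u}\left( |P_0 \setminus P| + |P \setminus P_0| \right) \\
&\le 1 - |P \triangle P_0| + |P^* \triangle P_0| + e^{2\sigma^2u^2 + u} |P^* \triangle P_0| + e^{2\sigma^2u^2 - u} |P \triangle P_0| \\
&\le 1 - |P \triangle P_0| \left( 1 - e^{2\sigma^2u^2 - u} \right) + \frac{2d^{d+1}(3/2)^d\beta_d}{n} \left( 1 + e^{2\sigma^2u^2 + u} \right).
\end{aligned} \tag{10}$$

Choose $u = \frac{1}{4\sigma^2}$. Then the quantity $1 - e^{2\sigma^2u^2 - u} = 1 - e^{-\frac{1}{8\sigma^2}}$ is positive and if $|P \triangle P_0| \ge \epsilon$, then

$$\mathbb{E}_{P_0}\left[ \exp(-uZ_1) \right] \le 1 - \epsilon\left( 1 - e^{-\frac{1}{8\sigma^2}} \right) + \frac{2d^{d+1}(3/2)^d\beta_d}{n}\left( 1 + e^{\frac{3}{8\sigma^2}} \right). \tag{11}$$
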



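We set $C_1' = 1 + e^{\frac{3}{8\sigma^2}}$ and $C_2 = 1 - e^{-\frac{1}{8\sigma^2}}$. These are positive constants that do not depend
on $n$ or $P_0$. From (8) and (11), and by the independence of the $Z_i$'s we have

$$\begin{aligned}
\mathbb{P}_{P_0}\left[ |\hat P_n^{(r)} \triangle P_0| \ge \epsilon \right]
&\le \sum_{P \in \mathcal{P}_r^{(n)} : |P \triangle P_0| \ge \epsilon} \left( 1 - C_2\epsilon + \frac{2d^{d+1}(3/2)^d\beta_d C_1'}{n} \right)^n \\
&\le (n+1)^{dr} \left( 1 - C_2\epsilon + \frac{2d^{d+1}(3/2)^d\beta_d C_1'}{n} \right)^n \\
&\le \exp\left( dr\ln(n+1) - C_2\epsilon n + 2d^{d+1}(3/2)^d\beta_d C_1' \right) \\
&\le C_1 \exp\left( 2dr\ln n - C_2\epsilon n \right),
\end{aligned} \tag{12}$$

where $C_1 = \exp\left( 2d^{d+1}(3/2)^d\beta_d C_1' \right)$, noting that $n + 1 \le n^2$ for $n \ge 2$. Therefore if we set
$\epsilon = \frac{2dr\ln n}{C_2 n} + \frac{x}{n}$ for a positive number $x$, we get the following deviation inequality

$$\mathbb{P}_{P_0}\left[ n\left( |\hat P_n^{(r)} \triangle P_0| - \frac{2dr\ln n}{C_2 n} \right) \ge x \right] \le C_1 e^{-C_2 x}.$$
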



Proof of Corollary 1 Corollary 1 follows directly from Theorem 1 and Fubini's theorem.
Indeed, if we denote $Z := |\hat P_n^{(r)} \triangle P_0|$ and, to shorten the formulas, $a_n := \frac{2dr\ln n}{C_2 n}$, then $Z$ is a
nonnegative random variable and we have, by Fubini's theorem, that

$$\begin{aligned}
\mathbb{E}_{P_0}[Z^q]
&= q \int_0^\infty u^{q-1}\, \mathbb{P}_{P_0}[Z \ge u]\, du \\
&\le q \int_0^{a_n} u^{q-1}\, du + q \int_{a_n}^\infty u^{q-1}\, \mathbb{P}_{P_0}[Z \ge u]\, du \\
&= a_n^q + q \int_0^\infty (u + a_n)^{q-1}\, \mathbb{P}_{P_0}\left[ n(Z - a_n) \ge nu \right] du \\
&\le a_n^q + q \int_0^\infty (u + a_n)^{q-1}\, C_1 e^{-C_2 n u}\, du \quad \text{by Theorem 1} \\
&\le a_n^q + C_1\, q \max(1, 2^{q-1}) \int_0^\infty \left( u^{q-1} + a_n^{q-1} \right) e^{-C_2 n u}\, du \\
&= a_n^q + C_1\, q \max(1, 2^{q-1}) \left( \frac{1}{(C_2 n)^q} \int_0^\infty v^{q-1} e^{-v}\, dv + \frac{a_n^{q-1}}{C_2 n} \right) \\
&\le 3 \left( \frac{2dr\ln n}{C_2 n} \right)^q
\end{aligned}$$

for $n$ large enough. Note that the bound on $(u + a_n)^{q-1}$ comes from the easy fact that for
any positive numbers $a$ and $b$, $(a + b)^{q-1} \le 2^{q-1}(a^{q-1} + b^{q-1})$ if $q - 1 \ge 0$, and $(a + b)^{q-1} \le
a^{q-1} + b^{q-1}$ if $q - 1 < 0$, and the next step uses the change of variable $v = C_2 n u$ together with
$\int_0^\infty v^{q-1} e^{-v}\, dv = \Gamma(q)$, which equals $(q-1)!$ when $q$ is an integer.




Proof of Theorem 2 This proof is a simple application of Corollary 2.6 in [26]. Let M 

be a positive integer, and /i = M+1 . Let fc = l,...,MbeM disjoint polytopes in Vd+i 
and with same volume : |Ti| = . . . = \Tm\ = h/2, where h = M _1 . 

For $k = 1, \ldots, M$ we denote by $\mathbb{P}_k$ the probability distribution of the observations
$(X_i, Y_i)$, $i = 1, \ldots, n$, when $G = T_k$ in (1), and by $\mathbb{E}_k$ the expectation with respect to this
distribution. A simple computation shows that the Kullback-Leibler divergence $K(\mathbb{P}_k, \mathbb{P}_l)$
between $\mathbb{P}_k$ and $\mathbb{P}_l$, for $k \ne l$, is equal to $\frac{nh}{2\sigma^2}$. On the other hand, the distance between $T_k$
and $T_l$, for $k \ne l$, is $|T_k \triangle T_l| = |T_k| + |T_l| = h$. Then

1 Mn/i n 

M + l^ ( 3 ' 0) = 4(M + l)a 2 " 4W 

Let a G (0, 1), and 7 = Then, if M = supposed without loss of generality to be 

an integer, we have 

9 Inn In Inn 
4a 2 aM In M = 2n + 2 In 7 2n > n 

n n 
for n large enough, so that



M 



1 M 



< a In M. 



Therefore, applying Corollary 2.6 in [26] with the pseudo distance defined in (2), we get
for $r \ge d + 1$ the following inequality



r | — 1 1 1 /ln(M + l)-ln2 

inf sup E P |PAP|> — — ( ^ 7/ n]. 

P PeP d+ i M + 1 



InM 



For $n$ large enough we have $M \ge 3$ and $\frac{\ln(M+1) - \ln 2}{\ln M} \ge 1 - \frac{\ln 2}{\ln 3}$. We choose $\alpha = \frac{1}{2} - \frac{\ln 2}{2\ln 3} \in
(0, 1)$. So, we get

„ r ,A. a a a Inn a 2 a 2 lnn 

inf sup E P PAP > > — > > . 

p p & v d+1 ~ Af + 1 ~ 2M ~ ~ n 

This immediately implies Theorem 2. ■ 



Proof of Theorem 3 The idea of the proof is very similar to that of Theorem 1. Here we
need to control an extra bias term, due to the approximation of $C$ by an $r$-vertex polytope.
We give the following lemma (cf. [10]).

Lemma 2. Let $r \ge d + 1$ be a positive integer. For any convex set $C \subseteq \mathbb{R}^d$ there exists a
polytope C r with at most r vertices such that 

where A is a positive constant that does not depend on r, d and C . 

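Let $P^*$ be a polytope chosen in $\mathcal{P}_r^{(n)}$ such that $|P^* \triangle C_r| \le \frac{2d^{d+1}(3/2)^d\beta_d}{n}$, like in the proof of
Theorem 1. Thus by the triangle inequality,

$$|P^* \triangle C| \le |P^* \triangle C_r| + |C_r \triangle C| \le \frac{2d^{d+1}(3/2)^d\beta_d}{n} + \frac{Ad}{r^{2/(d-1)}}.$$

We now bound from above the probability $\mathbb{P}_C\left[ |\hat P_n^{(r)} \triangle C| \ge \epsilon \right]$ for any $\epsilon > 0$. As in the
proof of Theorem 1, we have

$$\mathbb{P}_C\left[ |\hat P_n^{(r)} \triangle C| \ge \epsilon \right]
\le \mathbb{P}_C\left[ \exists P \in \mathcal{P}_r^{(n)} : A(P) \le A(P^*),\ |P \triangle C| \ge \epsilon \right]
\le \sum_{P \in \mathcal{P}_r^{(n)} : |P \triangle C| \ge \epsilon} \mathbb{P}_C\left[ A(P) \le A(P^*) \right].$$

Repeating the argument in (6) with $C$ instead of $P_0$ we get

$$A(P) - A(P^*) = \sum_{i=1}^n Z_i,$$

where

$$Z_i = I(X_i \in P) - I(X_i \in P^*) - 2 I(X_i \in C)\left[ I(X_i \in P) - I(X_i \in P^*) \right] - 2\xi_i \left[ I(X_i \in P) - I(X_i \in P^*) \right], \quad i = 1, \ldots, n.$$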

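The rest of the proof is very similar to the one of Theorem 1. Indeed, replacing $P_0$ by $C$
in that proof up to (10), and $\frac{2d^{d+1}(3/2)^d\beta_d}{n}$ by $\frac{Ad}{r^{2/(d-1)}} + \frac{2d^{d+1}(3/2)^d\beta_d}{n}$ in (11) and
(12), one gets, with $C_1' = 1 + e^{\frac{3}{8\sigma^2}}$:

$$\begin{aligned}
\mathbb{P}_C\left[ |\hat P_n^{(r)} \triangle C| \ge \epsilon \right]
&\le \sum_{P \in \mathcal{P}_r^{(n)} : |P \triangle C| \ge \epsilon} \left( 1 - C_2\epsilon + C_1' \left( \frac{Ad}{r^{2/(d-1)}} + \frac{2d^{d+1}(3/2)^d\beta_d}{n} \right) \right)^n \\
&\le (n+1)^{dr} \left( 1 - C_2\epsilon + C_1' \left( \frac{Ad}{r^{2/(d-1)}} + \frac{2d^{d+1}(3/2)^d\beta_d}{n} \right) \right)^n \\
&\le \exp\left( 2dr\ln n - C_2\epsilon n + C_1' \left( \frac{Adn}{r^{2/(d-1)}} + 2d^{d+1}(3/2)^d\beta_d \right) \right).
\end{aligned}$$

Therefore if we set $\epsilon = \frac{2dr\ln n}{C_2 n} + \frac{C_1' A d}{C_2 r^{2/(d-1)}} + \frac{x}{n}$ for a positive number $x$, we get the following
deviation inequality

$$\mathbb{P}_C\left[ n\left( |\hat P_n^{(r)} \triangle C| - \frac{2dr\ln n}{C_2 n} - \frac{C_1' A d}{C_2 r^{2/(d-1)}} \right) \ge x \right] \le C_1 e^{-C_2 x},$$
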



where the constants are defined as in the previous section. That ends the proof of Theorem
3 by choosing $r = \left\lfloor \left( \frac{n}{\ln n} \right)^{\frac{d-1}{d+1}} \right\rfloor$, and the constant $C_3$ is given by

d /„ q//s„2K .\ d _ 



(l + r' l .!)^=(l + (l + e 3/(8<T) )Aj T _ r 



1/(4(T 2 



Proof of Theorem 4 We first prove this theorem in the case $d = 2$ and then generalize
the proof for $d \ge 3$.

We more or less follow the lines of the proof of the lower bound in [18] (which is similar to
the proof of Assouad's lemma, see [26]). Let $G$ be the disk centered at $(1/2, 1/2)$ of radius
$1/2$, and $P$ be a regular convex polygon with $M$ vertices, all of them lying on the boundary of
$G$. Each edge of $P$ cuts a cap off $G$, of area $h$, with $\pi^3/(12M^3) \le h \le \pi^3/M^3$ as soon as
$M \ge 6$, which we will assume in the sequel. We denote these caps by $D_1, \ldots, D_M$, and for
any $\omega = (\omega_1, \ldots, \omega_M) \in \{0, 1\}^M$ we denote by $G_\omega$ the set made of $G$ out of which we took
all the caps $D_j$ for which $\omega_j = 0$, $j = 1, \ldots, M$.
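The bounds on the area $h$ of a cap can be checked numerically. The sketch below assumes the standard formula for the area of a circular segment, $\frac{R^2}{2}(\theta - \sin\theta)$, with radius $R = 1/2$ and central angle $\theta = 2\pi/M$; the function name `cap_area` is chosen for the example.

```python
import math


def cap_area(M, radius=0.5):
    """Area of the cap cut off a disk of the given radius by one edge of an
    inscribed regular polygon with M vertices (circular segment formula)."""
    theta = 2.0 * math.pi / M  # central angle seen by one edge
    return 0.5 * radius ** 2 * (theta - math.sin(theta))


for M in (6, 10, 100):
    lower = math.pi ** 3 / (12 * M ** 3)
    upper = math.pi ** 3 / M ** 3
    print(M, lower, cap_area(M), upper)
```

For large $M$ the area behaves like $\pi^3/(6M^3)$, which lies between the two bounds used in the proof.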

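For $j = 1, \ldots, M$, and $(\omega_1, \ldots, \omega_{j-1}, \omega_{j+1}, \ldots, \omega_M) \in \{0, 1\}^{M-1}$ we denote by

$$\omega^{(j,0)} = (\omega_1, \ldots, \omega_{j-1}, 0, \omega_{j+1}, \ldots, \omega_M) \quad \text{and by} \quad \omega^{(j,1)} = (\omega_1, \ldots, \omega_{j-1}, 1, \omega_{j+1}, \ldots, \omega_M).$$

Therefore note that for any $j = 1, \ldots, M$, and $(\omega_1, \ldots, \omega_{j-1}, \omega_{j+1}, \ldots, \omega_M) \in \{0, 1\}^{M-1}$,

$$|G_{\omega^{(j,0)}} \triangle G_{\omega^{(j,1)}}| = h.$$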

For two probability measures $\mathbb{P}$ and $\mathbb{Q}$ defined on the same probability space and having
densities denoted respectively by $p$ and $q$ with respect to a common measure $\nu$ (we also
write $d\mathbb{P} = p\,d\nu$ and $d\mathbb{Q} = q\,d\nu$), we call $H(\mathbb{P}, \mathbb{Q})$ the Hellinger distance between $\mathbb{P}$ and
$\mathbb{Q}$, defined as

$$H(\mathbb{P}, \mathbb{Q}) = \left( \int \left( \sqrt{p} - \sqrt{q} \right)^2 d\nu \right)^{1/2}.$$

Some useful properties of the Hellinger distance can be found in [26], Section 2.4.

Now, let us consider any estimator $\hat G$. For $j = 1, \ldots, M$ we denote by $A_j$ the smallest
convex cone with origin at $(1/2, 1/2)$ which contains the cap $D_j$. Note that the cones
$A_j$, $j = 1, \ldots, M$, have pairwise intersections of null Lebesgue measure. Then we have the
following inequalities:



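$$
\begin{aligned}
\sup_{G \in \mathcal{C}_2} \mathbb{E}_G\big[|\hat G \triangle G|\big]
&\ge \frac{1}{2^M} \sum_{\omega \in \{0,1\}^M} \mathbb{E}_{G_\omega}\big[|G_\omega \triangle \hat G|\big] \\
&\ge \frac{1}{2^M} \sum_{\omega \in \{0,1\}^M} \sum_{j=1}^{M} \mathbb{E}_{G_\omega}\big[|(G_\omega \cap A_j) \triangle (\hat G \cap A_j)|\big] \\
&= \frac{1}{2^M} \sum_{j=1}^{M} \sum_{\omega \in \{0,1\}^M} \mathbb{E}_{G_\omega}\big[|(G_\omega \cap A_j) \triangle (\hat G \cap A_j)|\big] \\
&= \frac{1}{2^M} \sum_{j=1}^{M} \sum_{\omega_1, \ldots, \omega_{j-1}, \omega_{j+1}, \ldots, \omega_M}
\Big( \mathbb{E}_{G_{\omega^{(j,0)}}}\big[|(G_{\omega^{(j,0)}} \cap A_j) \triangle (\hat G \cap A_j)|\big] \\
&\qquad\qquad + \mathbb{E}_{G_{\omega^{(j,1)}}}\big[|(G_{\omega^{(j,1)}} \cap A_j) \triangle (\hat G \cap A_j)|\big] \Big).
\end{aligned}
\tag{13}
$$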



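Besides, for any $j = 1, \ldots, M$ and $(\omega_1, \ldots, \omega_{j-1}, \omega_{j+1}, \ldots, \omega_M) \in \{0,1\}^{M-1}$ we have

$$
\begin{aligned}
&\mathbb{E}_{G_{\omega^{(j,0)}}}\big[|(G_{\omega^{(j,0)}} \cap A_j) \triangle (\hat G \cap A_j)|\big]
+ \mathbb{E}_{G_{\omega^{(j,1)}}}\big[|(G_{\omega^{(j,1)}} \cap A_j) \triangle (\hat G \cap A_j)|\big] \\
&\quad = \int_{([0,1]^2 \times \mathbb{R})^n} |(G_{\omega^{(j,0)}} \cap A_j) \triangle (\hat G \cap A_j)| \, d\mathbb{P}^{\otimes n}_{G_{\omega^{(j,0)}}}
+ \int_{([0,1]^2 \times \mathbb{R})^n} |(G_{\omega^{(j,1)}} \cap A_j) \triangle (\hat G \cap A_j)| \, d\mathbb{P}^{\otimes n}_{G_{\omega^{(j,1)}}}
\end{aligned}
\tag{14}
$$

$$
\begin{aligned}
&\quad \ge \int_{([0,1]^2 \times \mathbb{R})^n} \Big( |(G_{\omega^{(j,0)}} \cap A_j) \triangle (\hat G \cap A_j)| + |(G_{\omega^{(j,1)}} \cap A_j) \triangle (\hat G \cap A_j)| \Big)
\min\big(d\mathbb{P}^{\otimes n}_{G_{\omega^{(j,0)}}}, d\mathbb{P}^{\otimes n}_{G_{\omega^{(j,1)}}}\big) \\
&\quad \ge \int_{([0,1]^2 \times \mathbb{R})^n} |(G_{\omega^{(j,0)}} \cap A_j) \triangle (G_{\omega^{(j,1)}} \cap A_j)|
\min\big(d\mathbb{P}^{\otimes n}_{G_{\omega^{(j,0)}}}, d\mathbb{P}^{\otimes n}_{G_{\omega^{(j,1)}}}\big)
\quad \text{by the triangle inequality} \\
&\quad = h \int_{([0,1]^2 \times \mathbb{R})^n} \min\big(d\mathbb{P}^{\otimes n}_{G_{\omega^{(j,0)}}}, d\mathbb{P}^{\otimes n}_{G_{\omega^{(j,1)}}}\big) \\
&\quad \ge \frac{h}{2} \left( 1 - \frac{H^2\big(\mathbb{P}^{\otimes n}_{G_{\omega^{(j,0)}}}, \mathbb{P}^{\otimes n}_{G_{\omega^{(j,1)}}}\big)}{2} \right)^{2}
= \frac{h}{2} \left( 1 - \frac{H^2\big(\mathbb{P}_{G_{\omega^{(j,0)}}}, \mathbb{P}_{G_{\omega^{(j,1)}}}\big)}{2} \right)^{2n},
\end{aligned}
\tag{15}
$$

using properties of the Hellinger distance (cf. Section 2.4 in [26]). To compute the Hellinger
distance between $\mathbb{P}_{G_{\omega^{(j,0)}}}$ and $\mathbb{P}_{G_{\omega^{(j,1)}}}$ we use the following lemma.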

Lemma 3. For any integer $d \ge 2$, if $G_1$ and $G_2$ are two subsets of $[0,1]^d$, then

$$ H^2(\mathbb{P}_{G_1}, \mathbb{P}_{G_2}) = 2\left(1 - e^{-\frac{1}{8\sigma^2}}\right) |G_1 \triangle G_2|. $$



Then if we denote by $C_9 = 1 - e^{-1/(8\sigma^2)}$, it follows from (13) and (15) that

1 

Mh. 



sup E G 

Gec 2 



\GAG\ > ^.M.2 M ~ 1 .^{1 - C 9 h) 2n 



> 



(1-C 9 h) 



2n 



7T 



> — ^(1 - 7r 3 C 9 /M 3 ) 2n . 
- 12M 2V 9/ ' 

Besides, since we assumed that $M \ge 6$, we have that

$$ \pi^3 C_9 / M^3 \le \pi^3 C_9 / 6^3 = \frac{\pi^3}{216}\left(1 - \exp\left(-\frac{1}{8\sigma^2}\right)\right) \le \frac{\pi^3}{216} < 1, $$
and we get by concavity of the logarithm

.3 



sup Eg 

Gec 2 



IGAGI 



vr / 4321n(l - vr 3 /216) (l - exp(-^)) nM~ 3 \ _ 2/3 
> » exp -5 22 — I > Cun ' , 



12M 2 



7T" 



if we take M = L^ 1 ^ 3 ] , where Cu = — exp 
sitive constant that depends only on a. This inequality holds for n > 216, so that M > 6. 



'4321n(l -vr 3 /216) (l - exp(-g^)' 



is a po- 



We now deal with the case $d \ge 3$. Let us first recall some definitions and resulting
properties, which can also be found in [15].

Definition 1. Let $(S, \rho)$ be a metric space and $\eta$ a positive number.

A family $\mathcal{Y} \subseteq S$ is called an $\eta$-packing family if and only if $\rho(y, y') \ge \eta$ for all $(y, y') \in \mathcal{Y}^2$
with $y \ne y'$.

An $\eta$-packing family is called maximal if and only if it is not strictly included in any other
$\eta$-packing family. A family $\mathcal{Z}$ is called an $\eta$-net if and only if for all $x \in S$, there is an
element $z \in \mathcal{Z}$ which satisfies $\rho(x, z) \le \eta$.
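The link between maximal packings and nets can be illustrated on a finite sample of the sphere. This is a minimal sketch under stated assumptions: the sphere is replaced by a finite set of random points (so the family is maximal only relative to that sample), and `random_sphere_points` and `greedy_packing` are hypothetical helper names.

```python
import math
import random


def random_sphere_points(count, d, center=0.5, radius=0.5, seed=0):
    """Sample points on the sphere of R^d with the given center coordinate and
    radius; this finite sample is an assumed stand-in for the whole sphere."""
    rng = random.Random(seed)
    points = []
    for _ in range(count):
        g = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(c * c for c in g))
        points.append(tuple(center + radius * c / norm for c in g))
    return points


def greedy_packing(points, eta):
    """Keep a point whenever it lies at distance >= eta from all points kept so far.
    The result is an eta-packing that no remaining sample point can extend."""
    family = []
    for p in points:
        if all(math.dist(p, y) >= eta for y in family):
            family.append(p)
    return family


if __name__ == "__main__":
    pts = random_sphere_points(500, 3)
    fam = greedy_packing(pts, 0.3)
    print(len(fam))
```

Every rejected sample point lies within distance $\eta$ of a kept point, so the greedy family is also an $\eta$-net of the sample, which is the finite analogue of the argument used for Lemma 4.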

We now state a lemma.

Lemma 4. Let $S$ be the sphere with center $a_0 = (1/2, \ldots, 1/2) \in \mathbb{R}^d$ and radius $1/2$, and
$\rho$ the Euclidean distance in $\mathbb{R}^d$. We still denote by $\rho$ its restriction to $S$.

Let $\eta \in (0, 1)$. Then any $\eta$-packing family of $(S, \rho)$ is finite, and any maximal $\eta$-packing
family has a cardinality $M_\eta$ that satisfies the inequalities

$$ \frac{d\sqrt{2\pi}}{2^{d-1}\sqrt{d+2}\;\eta^{d-1}} \le M_\eta \le \frac{4^{d-2}\sqrt{2\pi d}}{3^{(d-3)/2}\;\eta^{d-1}}. \tag{16} $$

Figure 1 - Construction of the hypotheses

The construction of the hypotheses used for the lower bound in the case $d = 2$ requires
a little more work in the general dimension case, since it is not always possible to construct
a regular polytope with a given number of vertices or facets, inscribed in a given ball.
For the following geometrical construction, we refer to Figure 1.

Let $G_0$ be the closed ball in $\mathbb{R}^d$ with center $a_0 = (1/2, \ldots, 1/2)$ and radius $1/2$, so
that $G_0 \subseteq [0,1]^d$. Let $\eta \in (0,1)$, which will be chosen precisely later, and let $\{y_1, \ldots, y_{M_\eta}\}$ be
a maximal $\eta$-packing family of $S = \partial G_0$. The integer $M_\eta$ satisfies (16) by Lemma 4. For
$j \in \{1, \ldots, M_\eta\}$, we set $U_j = S \cap B_d(y_j, \eta/2)$, and denote by $W_j$ the $(d-2)$-dimensional
sphere $S \cap \partial B_d(y_j, \eta/2)$. Let $H_j$ be the affine hull of $W_j$, i.e. its supporting hyperplane. $H_j$
dissects the space $\mathbb{R}^d$ into two halfspaces. Let $H_j^-$ be the one that contains the point $y_j$.
For $\omega = (\omega_1, \ldots, \omega_{M_\eta}) \in \{0,1\}^{M_\eta}$, we denote by

$$ G_\omega = G_0 \setminus \Big( \bigcup_{j = 1, \ldots, M_\eta \,:\, \omega_j = 0} H_j^- \Big). $$

The set $G_\omega$ is made of $G_0$ from which we remove all the caps cut off by the hyperplanes
$H_j$, for all the indices $j$ such that $\omega_j = 0$.

For each $j \in \{1, \ldots, M_\eta\}$, let $A_j$ be the smallest closed convex cone with vertex $a_0 =
(1/2, \ldots, 1/2)$ that contains $U_j$. Note that the cones $A_j$, $j = 1, \ldots, M_\eta$, have pairwise
intersections of null Lebesgue measure (they share the vertex $a_0$), since $G_0$ is convex and the sets $U_j$ are disjoint. We are now all set to reproduce
the proof written in the case $d = 2$. Note that

$$ |G_{\omega^{(j,0)}} \triangle G_{\omega^{(j,1)}}| = |(G_{\omega^{(j,0)}} \cap A_j) \triangle (G_{\omega^{(j,1)}} \cap A_j)|, $$
for all $\omega \in \{0,1\}^{M_\eta}$ and $j \in \{1, \ldots, M_\eta\}$, and this quantity is equal to

$$ \int_0^{\eta^2/4} |B_{d-1}(0, \sqrt{r - r^2})|_{d-1} \, dr, $$

since, as mentioned before, $\eta^2/4$ is the height of the cap cut off by $H_j$, or in other words the
distance between $y_j$ and the hyperplane $H_j$, independent of the index $j$. Therefore,



f .nl 

|G (1> o,o)AG w(il i)| = / 4 l-Bd-iCO, Vr-r 2 )\ d ^dr 

.In 

^(r-r^-^dr 



,. n 

4 



/3 d _i / (r - r^-^dr 



Pd-irf 



,d+l r l 



4d+i 



(d-l)/2 



2 \ (d-l)/2 

1-^Y du. 



Since < r/ 2 /4 < 1/4, we then get 



23d(d + i) - I^uojAg^u,,)! < 22d+1(d + 1) 



(17) 
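The cap volume integral $\int_0^{\eta^2/4} \beta_{d-1}(r - r^2)^{(d-1)/2}\,dr$ can be evaluated numerically for given $d$ and $\eta$. The sketch below assumes a plain midpoint rule; `unit_ball_volume` and `cap_volume` are hypothetical helper names introduced for illustration only.

```python
import math


def unit_ball_volume(k: int) -> float:
    """Volume beta_k of the unit Euclidean ball in R^k."""
    return math.pi ** (k / 2) / math.gamma(k / 2 + 1)


def cap_volume(d: int, eta: float, steps: int = 20000) -> float:
    """Midpoint-rule value of the integral over [0, eta^2/4] of
    beta_{d-1} * (r - r^2)^((d-1)/2), i.e. the volume of a cap of height
    eta^2/4 of a ball of radius 1/2 in R^d."""
    height = eta ** 2 / 4.0
    dr = height / steps
    total = 0.0
    for i in range(steps):
        r = (i + 0.5) * dr  # midpoint of the i-th subinterval
        total += (r - r * r) ** ((d - 1) / 2.0)
    return unit_ball_volume(d - 1) * total * dr


if __name__ == "__main__":
    for d in (3, 4, 5):
        print(d, cap_volume(d, 0.8))
```

In dimension $d = 3$ the result can be compared with the classical spherical cap formula $\pi t^2 (3R - t)/3$ with $R = 1/2$ and $t = \eta^2/4$.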

Now, continuing (13) and (15), replacing $M$ by $M_\eta$ and $h$ by the lower bound in (17), and
using Lemmas 3 and 4, we get that

$$ \sup_{G \in \mathcal{C}_d} \mathbb{E}_G\big[|\hat G \triangle G|\big] \ge C_8 \eta^2 \big(1 - C_9 \eta^{d+1}\big)^{2n}, \tag{18} $$



where

$$ C_8 = \frac{d\sqrt{2\pi}\,\beta_{d-1}}{2^{4d+1}(d+1)\sqrt{d+2}} $$

and

$$ C_9 = \frac{\big(1 - e^{-1/(8\sigma^2)}\big)\beta_{d-1}}{2^{2d+1}(d+1)}. $$

Note that since the ball $B_{d-1}(0, 1/2)$ is included in the $(d-1)$-dimensional hypercube
centered at the origin, with sides of length 1, the following inequality holds

$$ |B_{d-1}(0, 1/2)| = \frac{\beta_{d-1}}{2^{d-1}} \le 1, $$
and this shows that $C_9 < 1$. Therefore, since $\eta < 1$ as well, the concavity of the logarithm
leads (18) to

$$ \sup_{G \in \mathcal{C}_d} \mathbb{E}_G\big[|\hat G \triangle G|\big] \ge C_8 \eta^2 \exp\big(2n \ln(1 - C_9)\, \eta^{d+1}\big). $$

Let us choose $\eta = n^{-1/(d+1)}$, so that (18) becomes

$$ \sup_{G \in \mathcal{C}_d} \mathbb{E}_G\big[|\hat G \triangle G|\big] \ge C_{10}\, n^{-\frac{2}{d+1}}, $$

where $C_{10} = C_8 (1 - C_9)^2 > 0$.
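The last step relies on the inequality $(1 - C_9/n)^{2n} \ge (1 - C_9)^2$, which follows from the concavity of the logarithm. A small numerical sketch makes this concrete; the values passed for `c8` and `c9` below are arbitrary placeholders, not the constants of the paper, and both function names are hypothetical.

```python
def lower_bound(n: int, d: int, c8: float, c9: float, eta: float) -> float:
    """Right-hand side of (18): c8 * eta^2 * (1 - c9 * eta^(d+1))^(2n)."""
    return c8 * eta ** 2 * (1.0 - c9 * eta ** (d + 1)) ** (2 * n)


def rate_bound(n: int, d: int, c8: float, c9: float) -> float:
    """c10 * n^(-2/(d+1)) with c10 = c8 * (1 - c9)^2."""
    return c8 * (1.0 - c9) ** 2 * n ** (-2.0 / (d + 1))


if __name__ == "__main__":
    d, c8, c9 = 3, 0.01, 0.5  # placeholder constants, for illustration only
    for n in (1, 10, 1000):
        eta = n ** (-1.0 / (d + 1))
        print(n, lower_bound(n, d, c8, c9, eta), rate_bound(n, d, c8, c9))
```

With $\eta = n^{-1/(d+1)}$ the first quantity dominates the second for every $n \ge 1$, with equality at $n = 1$.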



Proof of Theorem 5 Let $r^*$ be a given and finite integer such that $d + 1 \le r^* \le R_n$.
Note that if $r^* \le r \le r'$, then $\mathcal{P}_{r^*} \subseteq \mathcal{P}_r \subseteq \mathcal{P}_{r'}$. Therefore if $P \in \mathcal{P}_{r^*}$ and $G = P$ in
model (1), by Theorem 1 it is likely that with high probability we have, using the triangle
inequality:

$$ |\hat P_n^{(r)} \triangle \hat P_n^{(r')}| \le \frac{C d r' \ln n}{n} \tag{19} $$

for any $r^* \le r \le r'$, where $C$ is a constant. Therefore it is reasonable to select $\hat r$ as the
minimal integer that satisfies (19).



Let $\hat r$ be chosen as in Theorem 5. For $r = d + 1, \ldots, R_n$, let us denote by $A_r$ the event:

$$ A_r = \left\{ \forall r' = r, \ldots, R_n, \ |\hat P_n^{(r)} \triangle \hat P_n^{(r')}| \le \frac{6 d r' \ln n}{C_2 n} \right\}, $$

where $C_2$ is the same constant as in Theorem 1. Then $\hat r$ is the smallest integer $r \le R_n$ such
that $A_r$ holds.
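The selection rule for $\hat r$ can be sketched as follows. This is only an illustration of the rule, assuming that the symmetric-difference volumes between the estimators are available through a user-supplied function `dist(r, r_prime)`; that function, like the name `select_r`, is hypothetical, since computing the estimators themselves is outside the scope of this sketch.

```python
import math
from typing import Callable


def select_r(dist: Callable[[int, int], float], d: int, R_n: int, n: int, C2: float) -> int:
    """Return the smallest r in {d+1, ..., R_n} such that the event A_r holds, i.e.
    dist(r, r') <= 6 * d * r' * ln(n) / (C2 * n) for every r' in {r, ..., R_n}."""
    for r in range(d + 1, R_n + 1):
        if all(dist(r, rp) <= 6 * d * rp * math.log(n) / (C2 * n)
               for rp in range(r, R_n + 1)):
            return r
    return R_n  # not reached when dist(r, r) == 0; kept as a safeguard


if __name__ == "__main__":
    # Synthetic distances: estimators agree once both indices reach 7.
    def toy_dist(r: int, rp: int) -> float:
        if r == rp:
            return 0.0
        return 1.0 if min(r, rp) < 7 else 0.0

    print(select_r(toy_dist, d=2, R_n=10, n=1000, C2=1.0))
```

Since $A_{R_n}$ always holds when the distance of an estimator to itself is zero, the loop always returns a value not exceeding $R_n$.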

Let $P \in \mathcal{P}_{r^*}$. We write the following:

$$ \mathbb{E}_P\big[|\hat P_n^{adapt} \triangle P|\big] = \mathbb{E}_P\big[|\hat P_n^{adapt} \triangle P| \, I(\hat r \le r^*)\big] + \mathbb{E}_P\big[|\hat P_n^{adapt} \triangle P| \, I(\hat r > r^*)\big], \tag{20} $$

and we bound separately the two terms in the right side. Note that if $\hat r \le r^*$, then, since
the event $A_{\hat r}$ holds by definition,

$$ |\hat P_n^{(\hat r)} \triangle \hat P_n^{(r^*)}| \le \frac{6 d r^* \ln n}{C_2 n}. $$

Therefore, using the triangle inequality,



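$$
\begin{aligned}
\mathbb{E}_P\big[|\hat P_n^{adapt} \triangle P| \, I(\hat r \le r^*)\big]
&\le \mathbb{E}_P\big[|\hat P_n^{adapt} \triangle \hat P_n^{(r^*)}| \, I(\hat r \le r^*)\big] + \mathbb{E}_P\big[|\hat P_n^{(r^*)} \triangle P| \, I(\hat r \le r^*)\big] \\
&\le \frac{6 d r^* \ln n}{C_2 n} + \frac{A_1 d r^* \ln n}{n} \quad \text{by Corollary 1} \\
&\le \frac{C_{11} r^* \ln n}{n},
\end{aligned}
\tag{21}
$$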



where $C_{11}$ depends only on $d$ and $\sigma$. The second term of (20) is bounded differently. First
note that for all $r = d + 1, \ldots, R_n$, $\hat P_n^{(r)} \subseteq [0,1]^d$, so $|\hat P_n^{(r)}| \le 1$. Thus, if $\bar A_{r^*}$ stands for the
complement of the event $A_{r^*}$, we have the following inequalities.



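$$
\begin{aligned}
\mathbb{E}_P\big[|\hat P_n^{adapt} \triangle P| \, I(\hat r > r^*)\big]
&\le 2\, \mathbb{P}_P[\hat r > r^*] \le 2\, \mathbb{P}_P[\bar A_{r^*}] \\
&\le 2 \sum_{r = r^*}^{R_n} \mathbb{P}_P\left[ |\hat P_n^{(r^*)} \triangle \hat P_n^{(r)}| > \frac{6 d r \ln n}{C_2 n} \right] \\
&\le 2 \sum_{r = r^*}^{R_n} \mathbb{P}_P\left[ |\hat P_n^{(r^*)} \triangle P| + |\hat P_n^{(r)} \triangle P| > \frac{6 d r \ln n}{C_2 n} \right] \\
&\le 2 \sum_{r = r^*}^{R_n} \left( \mathbb{P}_P\left[ |\hat P_n^{(r^*)} \triangle P| > \frac{3 d r \ln n}{C_2 n} \right] + \mathbb{P}_P\left[ |\hat P_n^{(r)} \triangle P| > \frac{3 d r \ln n}{C_2 n} \right] \right) \\
&\le 2 \sum_{r = r^*}^{R_n} \left( \mathbb{P}_P\left[ |\hat P_n^{(r^*)} \triangle P| > \frac{3 d r^* \ln n}{C_2 n} \right] + \mathbb{P}_P\left[ |\hat P_n^{(r)} \triangle P| > \frac{3 d r \ln n}{C_2 n} \right] \right).
\end{aligned}
\tag{22}
$$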



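Note that since $P \in \mathcal{P}_{r^*}$, it is also true that $P \in \mathcal{P}_r$ for all $r \ge r^*$. Therefore, by Theorem 1,
using first $x = \frac{d r^* \ln n}{C_2}$, then $x = \frac{d r \ln n}{C_2}$, it follows from (22) that:

$$
\begin{aligned}
\mathbb{E}_P\big[|\hat P_n^{adapt} \triangle P| \, I(\hat r > r^*)\big]
&\le 2 \sum_{r = r^*}^{R_n} \big( C_1 e^{-d r^* \ln n} + C_1 e^{-d r \ln n} \big) \\
&\le 4 C_1 R_n\, n^{-d r^*} \\
&\le 4 C_1 R_n\, n^{-d(d+1)} \\
&\le 4 C_1 \left( \frac{n}{\ln n} \right)^{\frac{d-1}{d+1}} n^{-d(d+1)}.
\end{aligned}
\tag{23}
$$

Finally, using (21) and (23),

$$ \mathbb{E}_P\big[|\hat P_n^{adapt} \triangle P|\big] \le \frac{C_{12}\, r^* \ln n}{n}, $$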

where $C_{12}$ is a positive constant that depends on $d$ and $\sigma$. Let us now assume that $r^*$ is
a given integer larger than $R_n$, possibly infinite, and that $P \in \mathcal{P}_{r^*}$. As in Theorem 5, if
$r^* = \infty$ we denote by $\mathcal{P}_\infty$ the class $\mathcal{C}_d$. Then with probability one, $\hat r \le r^*$. First of all, note
that, since by definition $\hat r \le R_n$,

$$ |\hat P_n^{(R_n)} \triangle \hat P_n^{(\hat r)}| \le \frac{6 d R_n \ln n}{C_2 n} \le \frac{6d}{C_2} \left( \frac{\ln n}{n} \right)^{\frac{2}{d+1}}. $$



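Then, by the triangle inequality,

$$ \mathbb{E}_P\big[|\hat P_n^{adapt} \triangle P|\big] \le \frac{6d}{C_2} \left( \frac{\ln n}{n} \right)^{\frac{2}{d+1}} + \mathbb{E}_P\big[|\hat P_n^{(R_n)} \triangle P|\big], $$

and by Corollary 2 the last expectation is itself bounded by a constant multiple of $(\ln n / n)^{2/(d+1)}$,
since $P \in \mathcal{P}_{r^*} \subseteq \mathcal{P}_\infty$ and $\hat P_n^{(R_n)}$ is the estimator of Theorem 3. Theorem 5
is then proven. □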

7 Appendix : proof of the lemmas 

Proof of Lemma 1 Let us first state the following lemma, which gives the Steiner
formula in the case of polytopes. It can also be found in [3].

Lemma 5. For any polytope $R \subseteq \mathbb{R}^d$ the volume of $R^\lambda$ is polynomial in $\lambda$, with degree $d$,
that is, there exists $(L_0(R), \ldots, L_d(R)) \in \mathbb{R}^{d+1}$ such that

$$ |R^\lambda| = \sum_{k=0}^{d} L_k(R) \lambda^k, \quad \forall \lambda \ge 0. $$

Besides, $L_0(R) = |R|$, $L_1(R)$ is the surface area of $R$ and $L_d(R) = |B_d(0,1)|$, independent
of $R$, and all the $L_i(R)$, $i = 0, \ldots, d$, are nonnegative.

Note that in this lemma, if $R$ is included in $B_d(a, u)$ for some $a \in \mathbb{R}^d$ and $u > 0$, then
for all positive $\lambda$,

$$ R^\lambda \subseteq B_d(a, u)^\lambda = B_d(a, u + \lambda) $$
and if we denote by $\beta_d = |B_d(0,1)|$,

$$ |R^\lambda| = \sum_{k=0}^{d} L_k(R) \lambda^k \le (u + \lambda)^d \beta_d. \tag{24} $$

Therefore, since all the $L_i(R)$ are nonnegative, one gets

$$ L_i(R) \le (u + 1)^d \beta_d, \quad i = 1, \ldots, d \tag{25} $$

by taking $\lambda = 1$ in (24).
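The Steiner formula and the bounds (24) and (25) can be checked on a simple example. The sketch below restricts to axis-aligned boxes, for which the coefficients are known in closed form, $L_k = \beta_k\, e_{d-k}(\text{sides})$ with $e_j$ the $j$-th elementary symmetric polynomial; this closed form is a standard fact about boxes and is an assumption of the example, not something stated in the paper. The function names are hypothetical.

```python
import math
from itertools import combinations


def unit_ball_volume(k: int) -> float:
    """Volume beta_k of the unit Euclidean ball in R^k (beta_0 = 1)."""
    return math.pi ** (k / 2) / math.gamma(k / 2 + 1)


def steiner_coefficients(sides):
    """Coefficients L_0, ..., L_d of |R^lambda| for the box R with the given sides,
    using L_k = beta_k * e_{d-k}(sides) (assumed closed form for boxes)."""
    d = len(sides)
    coeffs = []
    for k in range(d + 1):
        e = sum(math.prod(c) for c in combinations(sides, d - k))
        coeffs.append(unit_ball_volume(k) * e)
    return coeffs


def parallel_volume(sides, lam: float) -> float:
    """Volume of the lambda-neighborhood of the box, via the Steiner polynomial."""
    return sum(L * lam ** k for k, L in enumerate(steiner_coefficients(sides)))


if __name__ == "__main__":
    print(steiner_coefficients([1.0, 1.0]))  # unit square: area, perimeter, pi
```

For the unit cube $[0,1]^d$, which is contained in the ball of radius $u = \sqrt{d}/2$ centered at $(1/2, \ldots, 1/2)$, the polynomial stays below $(u + \lambda)^d \beta_d$ as in (24).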



Let $r \le n$, and $P \in \mathcal{P}_r$. The polytope $P^*$ is constructed as follows. For any vertex $x$
of $P$, let $x^*$ be the closest point to $x$ in $[0,1]^d$ with coordinates that are integer multiples
of $1/n$ (if there are several such points $x^*$, then one can take any of them). The Euclidean
distance between $x$ and $x^*$ is bounded by $\sqrt{d}/n$.

Let us define $P^*$ as the convex hull of all these resulting $x^*$. Then $P^* \in \mathcal{P}_r^{(n)}$.

For any set $G \subseteq \mathbb{R}^d$ and $\epsilon > 0$ we denote by $G^\epsilon$ the set

$$ G^\epsilon = G + \epsilon B_d(0,1) = \{x \in \mathbb{R}^d : \rho(x, G) \le \epsilon\}. $$
It is clear that the Hausdorff distance between $P$ and $P^*$ is less than $\sqrt{d}/n$. Therefore if we
denote $\epsilon = \sqrt{d}/n$ we have $P^* \subseteq P^\epsilon$ and $P \subseteq (P^*)^\epsilon$.
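The discretization of the vertices can be sketched in a few lines. This is a minimal illustration of the rounding step only; `round_to_grid` and `grid_error` are hypothetical names, and ties between equally close grid points are broken by Python's built-in `round`.

```python
import math


def round_to_grid(x, n: int):
    """Closest point to x in [0,1]^d whose coordinates are integer multiples of 1/n."""
    return tuple(min(n, max(0, round(c * n))) / n for c in x)


def grid_error(x, n: int) -> float:
    """Euclidean distance between x and its rounded version."""
    return math.dist(x, round_to_grid(x, n))


if __name__ == "__main__":
    vertex = (0.123, 0.987, 0.5)
    print(round_to_grid(vertex, 10), grid_error(vertex, 10))
```

Each coordinate moves by at most $1/(2n)$, so the displacement is at most $\sqrt{d}/(2n)$, which is within the bound $\sqrt{d}/n$ used in the proof.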



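Since the two polytopes $P$ and $P^*$ are included in $B_d(a, \sqrt{d}/2)$, for $a = (1/2, \ldots, 1/2)$,
one gets from (25) that

$$ L_i(R) \le \left( \frac{\sqrt{d}}{2} + 1 \right)^d \beta_d \le \left( \frac{3\sqrt{d}}{2} \right)^d \beta_d, \quad i = 0, \ldots, d $$

for $R = P$ or $P^*$.

We can now bound the Nikodym distance between $P$ and $P^*$:

$$
\begin{aligned}
|P \triangle P^*| = |P \setminus P^*| + |P^* \setminus P|
&\le |(P^*)^\epsilon \setminus P^*| + |P^\epsilon \setminus P| \\
&\le 2 \left( \frac{3\sqrt{d}}{2} \right)^d \beta_d \sum_{k=1}^{d} \left( \frac{\sqrt{d}}{n} \right)^k \\
&\le \frac{2 d^{d+1} (3/2)^d \beta_d}{n}.
\end{aligned}
$$

□

Proof of Lemma 3 First note that if $G \subseteq [0,1]^d$, then the density of the probability
measure $\mathbb{P}_G$ with respect to the Lebesgue measure on $[0,1]^d \times \mathbb{R}$ is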

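$$ p_G(x, y) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(y - I(x \in G))^2}{2\sigma^2}}. $$
Therefore, by simple algebra, if $G_1$ and $G_2$ are two subsets of $[0,1]^d$, then

$$
\begin{aligned}
\int_{[0,1]^d \times \mathbb{R}} \sqrt{p_{G_1}(x, y)\, p_{G_2}(x, y)} \, dx \, dy
&= \int_{[0,1]^d} \exp\left( -\frac{(I(x \in G_1) - I(x \in G_2))^2}{8\sigma^2} \right) dx \\
&= |G_1 \triangle G_2|\, e^{-\frac{1}{8\sigma^2}} + 1 - |G_1 \triangle G_2|,
\end{aligned}
$$

and Lemma 3 follows from [26], Section 2.4. □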
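The key quantity in this computation is the Hellinger affinity between two Gaussian densities with means 0 and 1 and common variance $\sigma^2$, which equals $e^{-1/(8\sigma^2)}$. The sketch below checks this numerically with a midpoint rule; it assumes Gaussian noise as in the density above, and the function names are hypothetical.

```python
import math


def gaussian_density(y: float, mean: float, sigma: float) -> float:
    """Density of N(mean, sigma^2) at y."""
    return math.exp(-(y - mean) ** 2 / (2 * sigma ** 2)) / math.sqrt(2 * math.pi * sigma ** 2)


def hellinger_affinity(sigma: float, steps: int = 20000, half_width: float = 12.0) -> float:
    """Midpoint-rule value of the integral of sqrt(p0 * p1) over y,
    for p0 = N(0, sigma^2) and p1 = N(1, sigma^2)."""
    lo = 0.5 - half_width * sigma
    hi = 0.5 + half_width * sigma
    dy = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        y = lo + (i + 0.5) * dy
        total += math.sqrt(gaussian_density(y, 0.0, sigma) * gaussian_density(y, 1.0, sigma))
    return total * dy


def hellinger_squared(sym_diff_volume: float, sigma: float) -> float:
    """H^2 as in Lemma 3: 2 * (1 - exp(-1/(8 sigma^2))) * |G1 symmetric difference G2|."""
    return 2.0 * (1.0 - math.exp(-1.0 / (8.0 * sigma ** 2))) * sym_diff_volume


if __name__ == "__main__":
    for sigma in (0.5, 1.0, 2.0):
        print(sigma, hellinger_affinity(sigma), math.exp(-1.0 / (8.0 * sigma ** 2)))
```

Integrating over $x$ then contributes the affinity on $G_1 \triangle G_2$ and 1 elsewhere, which gives the expression $|G_1 \triangle G_2|\, e^{-1/(8\sigma^2)} + 1 - |G_1 \triangle G_2|$.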

Proof of Lemma 4 The fact that any $\eta$-packing family of $(S, \rho)$ is finite is clear and
comes from the fact that $S$ is compact. Consider now a maximal $\eta$-packing family of
$(S, \rho)$, denoted by $\{y_1, \ldots, y_{M_\eta}\}$. The surface area of $B_d(y_j, \eta/2) \cap S$ is independent of
$j \in \{1, \ldots, M_\eta\}$, and we denote it by $V(\eta/2)$. A simple application of the Pythagorean
theorem shows that $B_d(y_j, \eta/2) \cap S$ is a cap of height $\eta^2/4$ of $S$. Therefore, using Lemma
2.3 of [23],

/ 2x(d-3)/2 

V( V /2) > ^1 - r^ 1 . 

Besides, since $\{y_1, \ldots, y_{M_\eta}\}$ is an $\eta$-packing family, the caps $B_d(y_j, \eta/2) \cap S$, $j = 1, \ldots, M_\eta$,
are pairwise disjoint and the surface area of their union is less than the surface area of $S$,
which is equal to $\frac{d\beta_d}{2^{d-1}}$, so we get

$$ M_\eta V(\eta/2) \le \frac{d\beta_d}{2^{d-1}}. $$

Therefore, 



and the right inequality of Lemma 4 follows from the fact that $\eta^2/4 \le 1/4$ and Lemma 2.2
of [23], which states that

$$ \frac{\sqrt{2\pi}}{\sqrt{d+2}} \le \frac{\beta_d}{\beta_{d-1}} \le \frac{\sqrt{2\pi}}{\sqrt{d}}. \tag{26} $$

The left inequality of Lemma 4 comes from the fact that any maximal $\eta$-packing family
is an $\eta$-net. Indeed, consider a maximal $\eta$-packing family $\mathcal{Y}$, and assume it is not an $\eta$-net.
Then there exists $x \in S$ such that for all $y \in \mathcal{Y}$, $\rho(x, y) > \eta$. Therefore $\{x\} \cup \mathcal{Y}$ is
an $\eta$-packing family that contains $\mathcal{Y}$ strictly. This contradicts the maximality of $\mathcal{Y}$. Therefore the family
$\{y_1, \ldots, y_{M_\eta}\}$ is an $\eta$-net of $S$, and the caps $B_d(y_j, \eta) \cap S$, $j = 1, \ldots, M_\eta$, cover the sphere
$S$, so that

$$ M_\eta V(\eta) \ge \frac{d\beta_d}{2^{d-1}}. $$

Using again Lemma 2.3 of [23], we bound $V(\eta)$ from above



and then the desired result follows again from (26). □ 



References 

[1] Alon N., Spencer J. H. (2008) The probabilistic method, Wiley 

[2] Bárány, I. (2007) Random polytopes, convex bodies, and approximation, in Baddeley,
A. J., Bárány, I., Schneider, R., Weil, W. (2004) Stochastic Geometry, C.I.M.E. Summer
School, Martina Franca, Italy (W. Weil, ed.), Lecture Notes in Mathematics 1892, pp.
77-118, Springer, Berlin

[3] Berger M. (1978) Géométrie, Tome 3 : Convexes et polytopes, polyèdres réguliers, aires
et volumes, Ed. Cedic/Fernand Nathan

[4] Chichignoud M. (2010) Performances statistiques d'estimateurs non-linéaires,
Thèse de Doctorat, Université de Provence U.F.R. M.I.M., available at
http ://stat.ethz.ch/people/michaech/PDF_thesis

[5] Cuevas A. (2009) Set estimation: Another bridge between statistics and geometry,
Boletín de Estadística e Investigación Operativa, Vol. 25, No. 2, pp. 71-85

[6] Cuevas A., Fraiman R. (2010) Set estimation, in New Perspectives on Stochastic Geometry,
W.S. Kendall and I. Molchanov, eds., pp. 374-397, Oxford University Press

[7] Cuevas, A., Rodriguez-Casal, A. (2004) On boundary estimation. Adv. in Appl. Probab.
36, pp. 340-354

[8] Dudley R. M. (1974) Metric Entropy of Some Classes of Sets with Differentiable Boundaries,
Journal of Approximation Theory 10, 227-236

[9] Efron B. (1965) The Convex Hull of a Random Set of Points, Biometrika, Vol. 52, No
3/4, pp. 331-343

[10] Gordon Y., Meyer M., Reisner, S. (1995) Constructing a Polytope to Approximate a
Convex Body, Geometriae Dedicata 57, 217-222

[11] Ibragimov I. A., Khasminskii R. Z. (1984) Statistical Estimation: Asymptotic Theory,
New York: Springer-Verlag

[12] Janson S. (1987) Maximal spacings in several dimensions, The Annals of Probability, 
Vol. 15, No. 1, pp. 274-280 

[13] Kendall W.S., Molchanov I. (2010) New Perspectives in Stochastic Geometry. Eds., 
Oxford University Press 

[14] Kendall M. G., Moran P. A. P. (1963) Geometrical Probability, Griffin's Statistical 
Monographs and Courses, no. 10 

[15] Kolmogorov A. N., Tikhomirov V. M. (1959) ε-entropy and ε-capacity of sets in function
spaces, Uspekhi Mat. Nauk, 14:2(86), 3-86 (in Russian)

[16] Korostelev, A. P., Tsybakov, A. B. (1992) Asymptotically minimax image reconstruction
problems. In Topics in Nonparametric Estimation (R. Z. Khasminskii, ed.) 45-86. Amer.
Math. Soc., Providence, RI

[17] Korostelev, A.P.,Tsybakov, A.B. (1993) Minimax Theory of Image Reconstruction. 
Lecture Notes in Statistics, v. 82. Springer, NY e.a. 

[18] Korostelev, A. P., Tsybakov, A. B. (1994) Asymptotic efficiency in estimation of a
convex set (in Russian), Problems of Information Transmission, v. 30, n. 4, 317-327

[19] Lepski, O. V. (1991) Asymptotically minimax adaptive estimation I: Upper bounds,
optimally adaptive estimates. Theory Probab. Appl., 36:682-697

[20] Mammen, E., Tsybakov, A. (1995) Asymptotical Minimax Recovery of Sets with 
Smooth Boundaries, The Annals of Statistics, Vol. 23, No. 2, pp. 502-524 

[21] McClure D.E., Vitale R.A. (1975) Polygonal Approximation of Plane Convex Bodies, 
Journal of Mathematical Analysis and Applications, Vol. 51, No. 2 

[22] Pateiro Lopez B. (2008) Set estimation under convexity type restrictions, PhD thesis,
available at http ://eio.usc.es/pub/pateiro/files7T HESIS-BeatrizPateiroLopez.pdf

[23] Reisner S., Schütt C., Werner E. (2001) Dropping a vertex or a facet from a convex
polytope, Forum Math., 359-378

[24] Rényi A., Sulanke R. (1963) Über die konvexe Hülle von n zufällig gewählten Punkten.
Z. Wahrscheinlichkeitsth. Verw. Geb. 2, pp. 75-84

[25] Rényi A., Sulanke R. (1964) Über die konvexe Hülle von n zufällig gewählten Punkten.
II, Z. Wahrscheinlichkeitsth. Verw. Geb. 3, pp. 138-147

[26] Tsybakov, A.B. (2009) Introduction to nonparametric estimation, Springer. 






